Golang Job: Site Reliability Engineer - Staff

Job added on

Location

Amsterdam - Netherlands

Job type

Full-Time

Golang Job Details

Site Reliability Engineer - Staff

Juniper is changing what’s possible in networking. We’re going beyond building the networks customers expect — we’re building the networks customers deserve. And the world is taking note. But to continue to excel, we have work to do. Change in our industry is accelerating. To power connections and empower change, we need radical thinkers, eternal optimists, and energized personalities. We need people like you.

Success requires big thinking and high-reaching goals. Our culture breeds innovation. Here, you will have the opportunity to take chances and let your ideas grow. You will be supported by thoughtful, inclusive, and accessible leaders. You will have every chance to be a part of the conversation and seize our momentum. Your career will be better for it.

At Juniper, we strive to deliver network experiences that transform how people connect, work, and live. We Power Connections and Empower Change, and we do that through our core values: Being Bold, Building Trust, and Delivering Excellence.

Do you want to solve complex problems and build systems that will change the Internet? Do you want to be part of a company that is on the cutting edge of technology? Do you want to work with a world-class team of engineers?

Juniper's AI-Driven Enterprise (AIDE) is seeking a full-time SRE to join our talented team and build high-quality technology solutions that revolutionize wireless networks, powered by artificial intelligence in the cloud. Mist provides services through SaaS applications to several Fortune 100 and Fortune 500 customers. You'll take ops projects from concept through launch. You will be responsible for maintaining and improving the company's production environment for rapid scaling and outstanding performance, and for helping us maintain stellar uptime and reliability. The improvements you implement will be felt across the entire organization.

As a Site Reliability Engineer (SRE) at Juniper Networks, you will be responsible for keeping our cloud-based services, streaming frameworks, NoSQL/RDBMS databases, and distributed analytical platforms running in multi-cloud environments. These systems deliver unprecedented IT automation and insight into user experiences, driven by our AI services, across our customers' geographically distributed networks.

Responsibilities:

  • Build infrastructure as code using Terraform, Ansible, and Kubernetes.
  • Manage and performance-tune either databases (Postgres, Redis, Cassandra, Elasticsearch) or streaming data pipelines (Kafka, Flink, Storm, and Spark frameworks).
  • Manage CI/CD pipelines, configuration, and automation tools for infrastructure provisioning.
  • Write and maintain runbooks for knowledge-driven automated processes and bots.
  • Perform capacity planning based on performance, usage, and utilization statistics.
  • Partner with developers and quality engineering teams to automate the monitoring, alerting, availability and scalability of our applications and systems.
  • Ensure system availability and business continuity by implementing redundant servers/services.
  • Manage after-hours infrastructure updates and maintenance.
  • Proactively research and propose the use of new concepts, processes, technologies, and tools.
  • Provide proactive monitoring, diagnosis, and resolution of issues as part of an on-call rotation in a 24x7 multi-cloud environment (AWS/GCP); analyze failures and help software engineers debug production issues across microservices and distributed platforms.
  • Follow SRE best practices and procedures.

Experience required for you to be successful:

  • An extensive background in developing and operating large-scale cloud-based distributed applications
  • Direct experience developing and running applications on AWS and Google Cloud
  • The ability to design infrastructure solutions with a laser focus on scalability, reliability, high availability, performance, software maintainability, and operational excellence
  • The ability to "fix the plane while in flight" (not just support greenfield solutions)
  • The ability to prioritize existing technical and infrastructure debt, and the experience to build and execute a plan to pay it off

Required skills:

  • Bachelor’s degree in Computer Science, Computer Engineering, or equivalent
  • Basic programming skills in Python, Java, or Golang.
  • Understanding of distributed systems.
  • Understanding of data management technologies including relational and non-relational databases.